
    Efficient genotype compression and analysis of large genetic-variation data sets

    Genotype Query Tools (GQT) is an indexing strategy that expedites analyses of genome-variation data sets in Variant Call Format based on sample genotypes, phenotypes and relationships. GQT's compressed genotype index minimizes decompression for analysis, and its performance relative to that of existing methods improves with cohort size. We show substantial (up to 443-fold) gains in performance over existing methods and demonstrate GQT's utility for exploring massive data sets involving thousands to millions of genomes. GQT can be accessed at https://github.com/ryanlayer/gqt

    Patterns of genic intolerance of rare copy number variation in 59,898 human exomes.

    Copy number variation (CNV) affecting protein-coding genes contributes substantially to human diversity and disease. Here we characterized the rates and properties of rare genic CNVs (<0.5% frequency) in exome sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC) database. On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. For individuals with schizophrenia, genes affected by CNVs were more intolerant than in controls. The ExAC CNV data constitute a critical component of an integrated database spanning the spectrum of human genetic variation, aiding in the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online

    Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes.

    Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs

    The Escherichia coli transcriptome mostly consists of independently regulated modules

    Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome

    Targeted genetic analysis in a large cohort of familial and sporadic cases of aneurysm or dissection of the thoracic aorta

    PURPOSE: Thoracic aortic aneurysm/aortic dissection (TAAD) is a disorder with highly variable age of onset and phenotype. We sought to determine the prevalence of pathogenic variants in TAAD-associated genes in a mixed cohort of sporadic and familial TAAD patients and identify relevant genotype–phenotype relationships. METHODS: We used a targeted polymerase chain reaction and next-generation sequencing–based panel for genetic analysis of 15 TAAD-associated genes in 1,025 unrelated TAAD cases. RESULTS: We identified 49 pathogenic or likely pathogenic (P/LP) variants in 47 cases (4.9% of those successfully sequenced). Almost half of the variants were in nonsyndromic cases with no known family history of aortic disease. Twenty-five variants were within FBN1 and two patients were found to harbor two P/LP variants. Presence of a related syndrome, younger age at presentation, family history of aortic disease, and involvement of the ascending aorta increased the risk of carrying a P/LP variant. CONCLUSION: Given the poor prognosis of TAAD that is undiagnosed prior to acute rupture or dissection, genetic analysis of both familial and sporadic cases of TAAD will lead to new diagnoses, more informed management, and possibly reduced mortality through earlier, preclinical diagnosis in genetically determined cases and their family members

    MKLN1 splicing defect in dogs with lethal acrodermatitis

    Lethal acrodermatitis (LAD) is a genodermatosis with monogenic autosomal recessive inheritance in Bull Terriers and Miniature Bull Terriers. The LAD phenotype is characterized by poor growth, immune deficiency, and skin lesions, especially at the paws. Utilizing a combination of genome wide association study and haplotype analysis, we mapped the LAD locus to a critical interval of ~1.11 Mb on chromosome 14. Whole genome sequencing of an LAD affected dog revealed a splice region variant in the MKLN1 gene that was not present in 191 control genomes (chr14:5,731,405T>G or MKLN1:c.400+3A>C). This variant showed perfect association in a larger combined Bull Terrier/Miniature Bull Terrier cohort of 46 cases and 294 controls. The variant was absent from 462 genetically diverse control dogs of 62 other dog breeds. RT-PCR analysis of skin RNA from an affected and a control dog demonstrated skipping of exon 4 in the MKLN1 transcripts of the LAD affected dog, which leads to a shift in the MKLN1 reading frame. MKLN1 encodes the widely expressed intracellular protein muskelin 1, for which diverse functions in cell adhesion, morphology, spreading, and intracellular transport processes are discussed. While the pathogenesis of LAD remains unclear, our data facilitate genetic testing of Bull Terriers and Miniature Bull Terriers to prevent the unintentional production of LAD affected dogs. This study may provide a starting point to further clarify the elusive physiological role of muskelin 1 in vivo.

    Insights into the genetic epidemiology of Crohn's and rare diseases in the Ashkenazi Jewish population

    As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the Ashkenazi Jewish population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. Arising via a well-described founder effect approximately 30 generations ago, this catalog of enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations. As validation we document 148 AJ enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10-100 fold differences in prevalence between AJ and non-AJ populations of some rare diseases, especially recessive conditions, including Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease (CD), a common disease with an estimated two- to four-fold excess prevalence in AJ. We specifically attempt to evaluate whether strong-acting rare alleles, particularly protein-truncating or otherwise large effect-size alleles, enriched by the same founder effect, contribute excess genetic risk to Crohn's disease in AJ, and find that ten rare genetic risk factors in NOD2 and LRRK2 that are enriched in AJ (p < 0.005), including several novel contributing alleles, show evidence of association with CD. Independently, we find that genome-wide common variant risk defined by GWAS shows a strong difference between AJ and non-AJ European control population samples (0.97 s.d. higher, p < 10^-16). Taken together, the results suggest coordinated selection in the AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/) to pinpoint genetic variation that contributes to variable disease predisposition across populations

    Variant curation expert panel recommendations for RYR1 pathogenicity classifications in malignant hyperthermia susceptibility

    Purpose: As a ClinGen Expert Panel (EP) we set out to adapt the American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) pathogenicity criteria for classification of RYR1 variants as related to autosomal dominantly inherited malignant hyperthermia (MH). Methods: We specified ACMG/AMP criteria for variant classification for RYR1 and MH. Proposed rules were piloted on 84 variants. We applied quantitative evidence calibration for several criteria using likelihood ratios based on the Bayesian framework. Results: Seven ACMG/AMP criteria were adopted without changes, nine were adopted with RYR1-specific modifications, and ten were dropped. The in silico (PP3 and BP4) and hotspot criteria (PM1) were evaluated quantitatively. REVEL gave an odds ratio (OR) of 23:1 for PP3 and 14:1 for BP4 using trichotomized cutoffs of ≥0.85 (pathogenic) and ≤0.5 (benign). The PM1 hotspot criterion had an OR of 24:1. PP3 and PM1 were implemented at moderate strength. Applying the revised ACMG/AMP criteria to 44 recognized MH variants, 29 were classified as pathogenic, 13 as likely pathogenic, and 2 as variants of uncertain significance. Conclusion: Curation of these variants will facilitate classification of RYR1/MH genomic testing results, which is especially important for secondary findings analyses. Our approach to quantitatively calibrating criteria is generalizable to other variant curation expert panels